The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.
translated by 谷歌翻译
最近的神经生成系统已经证明了程序性生成游戏内容,图像,故事等的潜力。但是,大多数神经生成算法是“不受控制的”,因为用户在最初的及时规范之外的创意决策中几乎没有发言权。共同创造性的混合定位系统需要以用户为中心的影响算法,尤其是当用户不太可能拥有机器学习专业知识时。共同创造系统的关键是能够从用户到代理以及从代理到用户传达想法和意图的能力。共同创造的AI中的关键问题包括:用户如何表达自己的创造意图? Creative AI系统如何传达他们的信念,解释他们的举动或指示用户代表他们采取行动? Creative AI系统何时应该采取主动?此类问题的答案以及更多的答案将使我们能够开发出更好的共同创造系统,从而使人类更有能力表达自己的创造意图。我们介绍了Creative-Wand,这是一个可定制的框架,用于调查共同创造的混合发电生成。 Creative-Wand可以将生成模型和人类代理通信渠道的插入式注射到基于聊天的接口中。它提供了许多维度,在共同创造过程中,AI发生器和人类可以进行交流。我们通过使用该框架来研究共同创造性通信全球广播的一个维度与本地创意意图通过讲故事的上下文来说明创意范围的框架。
translated by 谷歌翻译
CNN表现出与人类不同的许多行为,其中之一是采用高频组件的能力。本文讨论了图像分类任务中的频率偏差现象:高频组件实际上比低频和中频组件的利用要少得多。我们首先通过提出有关特征歧视和学习优先级的两个观察结果来研究频率偏差现象。此外,我们假设(i)光谱密度,(ii)类一致性直接影响频率偏差。具体而言,我们的研究验证数据集的光谱密度主要影响学习优先级,而课程一致性主要影响特征歧视。
translated by 谷歌翻译
在传统的对象检测框架中,从图像识别模型继承的骨干体提取了深层特征,然后颈部模块融合了这些潜在特征,以在不同的尺度上捕获信息。由于对象检测的分辨率比图像识别大得多,因此骨干的计算成本通常主导了总推断成本。这种沉重的背部设计范式主要是由于历史遗产将图像识别模型传输到对象检测时,而不是端到端的优化设计以进行对象检测。在这项工作中,我们表明这种范式确实导致了亚最佳对象检测模型。为此,我们提出了一种新型的重颈范式,长颈鹿,这是一个类似长颈鹿的网络,用于有效的对象检测。长颈鹿使用极轻的骨干和非常深的颈部模块,可同时同时在不同的空间尺度以及不同级别的潜在语义之间进行密集的信息交换。该设计范式允许检测器即使在网络的早期阶段,也可以在相同的优先级处理高级语义信息和低级空间信息,从而使其在检测任务中更有效。对多个流行对象检测基准的数值评估表明,长颈鹿在广泛的资源约束中始终优于先前的SOTA模型。源代码可在https://github.com/jyqi/giraffedet上获得。
translated by 谷歌翻译
在对象检测模型中,检测骨干机消耗超过一半的整体推理成本。最近的研究试图通过在神经结构搜索(NAS)的帮助下优化骨干架构来降低这一成本。然而,对象检测的现有NAS方法需要数百至数千个GPU小时的搜索,使它们在快节奏的研究和开发中不切实际。在这项工作中,我们提出了一种新的零射NAS方法来解决这个问题。所提出的方法,命名为Zendet,在不训练网络参数的情况下自动设计有效的检测骨干网,从而降低了架构设计成本,几乎归零但提供了最先进的(SOTA)性能。在引擎盖下,Zendet最大化了检测骨干的差分熵,导致对象检测的更好的特征提取器,在相同的计算预算下。在仅为全自动设计的一个GPU日之后,Zendet在多个检测基准数据集上创新了SOTA检测骨干,具有很少的人为干预。与Reset-50个骨干相比,Zendet在Map中使用相同数量的拖波/参数时更好地+ 2.0%,并且在同一地图上的NVIDIA V100速度快1.54倍。稍后将发布代码和预先训练的型号。
translated by 谷歌翻译
尽管在许多领域都有成功的应用,但如今的机器学习模型遭受了臭名昭著的问题,例如脆弱性,对对抗性例子。除了陷入对抗攻击和防御之间的猫与小鼠游戏之外,本文还提供了替代观点来考虑对抗性示例,并探索我们是否可以在良性应用中利用它。我们首先将对抗性示例归因于使用非语义特征的人类模型差异。尽管在经典的机器学习机制中很大程度上被忽略了,但非语义功能具有三个有趣的特征,因为(1)模型独有,(2)对推理至关重要,以及(3)可利用的功能。受到这一点的启发,我们提出了良性的对抗性攻击的新想法,以利用三个方向的对抗性示例以善良:(1)对抗性图灵测试,(2)拒绝恶意模型应用,以及(3)对抗性数据扩增。每个方向都以动机详细说明,理由分析和原型应用来展示其潜力。
translated by 谷歌翻译
在近期深度图像压缩神经网络中,熵模型在估计深度图像编码的先前分配时起着重要作用。现有方法将HydupRior与熵估计功能中的本地上下文组合。由于没有全球愿景,这大大限制了他们的表现。在这项工作中,我们提出了一种新的全局参考模型,用于图像压缩,以有效地利用本地和全局上下文信息,导致增强的压缩率。所提出的方法扫描解码的潜伏,然后找到最相关的潜伏,以帮助分布估计当前潜伏。这项工作的副产品是一种平均转换GDN模块的创新,进一步提高了性能。实验结果表明,所提出的模型优于行业中大多数最先进方法的速率变形性能。
translated by 谷歌翻译
Making safe and human-like decisions is an essential capability of autonomous driving systems and learning-based behavior planning is a promising pathway toward this objective. Distinguished from existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. Concretely, a behavior generation module first produces a diverse set of candidate behaviors in the form of trajectory proposals. Then the proposed conditional motion prediction network is employed to forecast other agents' future trajectories conditioned on each trajectory proposal. Given the candidate plans and associated prediction results, we learn a scoring module to evaluate the plans using maximum entropy inverse reinforcement learning (IRL). We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results reveal that the conditional prediction model is able to forecast multiple possible future trajectories given a candidate behavior and the prediction results are reactive to different plans. Moreover, the IRL-based scoring module can properly evaluate the trajectory proposals and select close-to-human ones. The proposed framework outperforms other baseline methods in terms of similarity to human driving trajectories. Moreover, we find that the conditional prediction model can improve both prediction and planning performance compared to the non-conditional model, and learning the scoring module is critical to correctly evaluating the candidate plans to align with human drivers.
translated by 谷歌翻译
The foundation models have recently shown excellent performance on a variety of downstream tasks in computer vision. However, most existing vision foundation models simply focus on image-level pretraining and adpation, which are limited for dynamic and complex video-level understanding tasks. To fill the gap, we present general video foundation models, InternVideo, by taking advantage of both generative and discriminative self-supervised video learning. Specifically, InternVideo efficiently explores masked video modeling and video-language contrastive learning as the pretraining objectives, and selectively coordinates video representations of these two complementary frameworks in a learnable manner to boost various video applications. Without bells and whistles, InternVideo achieves state-of-the-art performance on 39 video datasets from extensive tasks including video action recognition/detection, video-language alignment, and open-world video applications. Especially, our methods can obtain 91.1% and 77.2% top-1 accuracy on the challenging Kinetics-400 and Something-Something V2 benchmarks, respectively. All of these results effectively show the generality of our InternVideo for video understanding. The code will be released at https://github.com/OpenGVLab/InternVideo .
translated by 谷歌翻译
Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers the assessment accuracy when computer technology is used to aid in diagnosis. Methods: This present study provided a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, the experimental results for EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results showed that deep learning methods had a better image segmentation performance when utilizing EBHI-Seg. The maximum accuracy of the Dice evaluation metric for the classical machine learning method is 0.948, while the Dice evaluation metric for the deep learning method is 0.965. Conclusion: This publicly available dataset contained 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can provide researchers with new segmentation algorithms for medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.
translated by 谷歌翻译